Abstract

We consider structural equation models (SEMs) in which each variable can be written as a
function of its parents and a noise term (the noise terms are assumed to be jointly independent).
Corresponding to each SEM, there is a directed acyclic graph (DAG) $\mathcal{G}_0$ describing the
relationships between the variables. In Gaussian SEMs with linear functions, the graph can
be identified from the joint distribution only up to its Markov equivalence class (assuming
faithfulness). It has been shown, however, that this constitutes an exceptional case. In
the case of linear functions and non-Gaussian noise, the DAG becomes identifiable. Apart
from a few exceptions, the same is true for non-linear functions and arbitrarily distributed
additive noise. In this work, we prove identifiability for a third modification: if we require
all noise variables to have the same variance, again, the DAG can be recovered from the
joint Gaussian distribution. Our result can be applied to the problem of causal inference. If
the data follow a Gaussian SEM with same error variances and all variables are
observed, the causal structure can be inferred from observational data alone.



1 Introduction 



1.1 Graphical and Structural Equation Models 

For random variables $X_1, \ldots, X_p$, we define a graphical model as a pair
$(\mathcal{G}, \mathcal{L}(\mathbf{X}))$ with a joint probability distribution
$\mathcal{L}(\mathbf{X}) = \mathcal{L}(X_1, \ldots, X_p)$ that is Markov with respect to a directed
acyclic graph (DAG) $\mathcal{G}$ [Lauritzen, 1996]. Structural equation models (SEMs) (also referred
to as functional models) are related to graphical models. They are specified by a pair
$(\mathcal{S}, \mathcal{L}(\mathbf{N}))$, where $\mathcal{S} = \{S_1, \ldots, S_p\}$ is a collection of $p$ equations

$$S_j: \quad X_j = f_j(\mathbf{X}_{\mathbf{PA}_j}, N_j) \tag{1}$$

and a joint distribution $\mathcal{L}(\mathbf{N}) = \mathcal{L}(N_1, \ldots, N_p)$ of the noise variables. We require the noise
terms to be jointly independent, which means $\mathcal{L}(\mathbf{N})$ is a product distribution. The graph $\mathcal{G}$
of an SEM is obtained by drawing directed edges from each variable $X_k$, $k \in \mathbf{PA}_j$, occurring
on the right-hand side of equation (1) to $X_j$, and is required to be acyclic. $\mathbf{X}_{\mathbf{PA}_j}$ coincides
with the set of parents of $X_j$ in graph $\mathcal{G}$; see Section 2 for notation. Furthermore, given
an SEM $(\mathcal{S}, \mathcal{L}(\mathbf{N}))$, the joint distribution $\mathcal{L}(\mathbf{X})$ is fully determined. $\mathcal{L}(\mathbf{X})$ is Markov with
respect to the graph $\mathcal{G}$ [Pearl, 2009, Theorem 1.4.1].
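As a minimal sketch of equation (1), the following Python snippet evaluates a small SEM by visiting the nodes in a topological order of its DAG. The three-variable graph, the particular functions, and the names `sample_sem`, `parents` and `functions` are illustrative assumptions and do not come from the text.

```python
import numpy as np

def sample_sem(parents, functions, noise, order):
    """Evaluate X_j = f_j(X_PA_j, N_j) for each node j, visiting nodes in a topological order."""
    x = {}
    for j in order:
        parent_values = [x[k] for k in parents[j]]  # parents were computed earlier in the order
        x[j] = functions[j](parent_values, noise[j])
    return x

# Hypothetical example: DAG 1 -> 2, 1 -> 3, 2 -> 3 with jointly independent noise terms.
rng = np.random.default_rng(0)
n = 1000
noise = {j: rng.normal(size=n) for j in (1, 2, 3)}
parents = {1: [], 2: [1], 3: [1, 2]}
functions = {
    1: lambda pa, eps: eps,
    2: lambda pa, eps: 0.8 * pa[0] + eps,
    3: lambda pa, eps: np.tanh(pa[0]) - 0.5 * pa[1] + eps,
}
x = sample_sem(parents, functions, noise, order=[1, 2, 3])
```

Because the graph is acyclic, such an order always exists, which is why the equations and the noise distribution together determine the joint distribution.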
1.2 Identifiability from the Distribution



We address the following problem: Given a joint distribution
$\mathcal{L}(\mathbf{X}) = \mathcal{L}(X_1, \ldots, X_p)$ from a graphical model (or from an SEM) with DAG
$\mathcal{G}_0$, can we recover the graph $\mathcal{G}_0$? By first considering graphical models one can
easily see that the answer is negative: The joint distribution $\mathcal{L}(\mathbf{X})$ is certainly Markov
with respect to many different DAGs, e.g. to all fully connected DAGs. Thus, there are many
possible graphical models $(\mathcal{G}, \mathcal{L}(\mathbf{X}))$ for the same $\mathcal{L}(\mathbf{X})$. Similarly, there are SEMs
with different structures that could have generated the distribution $\mathcal{L}(\mathbf{X})$.

What can we do to overcome this indeterminacy? The hope is that by using additional
assumptions one obtains restricted graphical models and restricted structural equation models
for which the graph is identifiable from the joint distribution. In our opinion, it is precisely
here that the difference between graphical and functional models becomes apparent:
For graphical models it has been suggested to assume faithfulness, that is, each conditional
independence found in $\mathcal{L}(\mathbf{X})$ is implied by the Markov condition. If faithfulness holds, it is
proven that one can obtain the Markov equivalence class of the true DAG $\mathcal{G}_0$ [Spirtes et al.,
2000]. But the Markov equivalence class may still be large [cf. Andersson et al., 1997] and
the DAG $\mathcal{G}_0$ is not identifiable. Furthermore, faithfulness in its full generality cannot be
tested from data [Zhang and Spirtes, 2008]. Since both assumptions (Markov condition and
faithfulness) put restrictions only on the conditional independences in the joint distribution,
it is not surprising that two graphs entailing exactly the same conditional independences
cannot be distinguished.

Structural equation models enable us to exploit a different type of restriction. First, a
Gaussian SEM is equivalent to a Gaussian graphical model $(\mathcal{G}_0, \mathcal{L}(\mathbf{X}))$, and hence, the
structure $\mathcal{G}_0$ is not identifiable from $\mathcal{L}(\mathbf{X})$. Recently, however, it has been shown that this case
is exceptional in the following sense: (i) If we consider linear functions and non-Gaussian
noise, one can identify the underlying DAG $\mathcal{G}_0$ [Shimizu et al., 2006]; (ii) if one restricts
the functions to be additive in the noise component and excludes the linear Gaussian case
(as well as a few other pathological function-noise combinations), one can show that $\mathcal{G}_0$ is
identifiable from $\mathcal{L}(\mathbf{X})$ [Hoyer et al., 2009, Peters et al., 2011]. In this work, we prove a third
way of deviating from the general linear Gaussian case. Namely, (iii) Gaussian SEMs
where all functions are linear, but the normally distributed noise variables have the same
variance $\sigma^2$, are again identifiable.

It may come as a surprise that the underlying DAG is identifiable for a class of Gaussian SEMs.
The assumption of same error variances seems natural for a range of applications
(with variables from a similar domain) and is commonly used in time series models.
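The two-variable case gives some intuition for statement (iii). The sketch below assumes a model X1 -> X2 with coefficient `beta` and common noise variance `sigma2`, and computes which noise variances a reversed model X2 -> X1 would need in order to produce the same Gaussian distribution. The function name and arguments are hypothetical and chosen only for this illustration.

```python
def reversed_noise_variances(beta, sigma2):
    """Noise variances required to rewrite X1 -> X2 (common variance sigma2) as X2 -> X1."""
    var_x1 = sigma2                          # X1 = N1
    var_x2 = beta**2 * sigma2 + sigma2       # X2 = beta * X1 + N2
    cov_12 = beta * sigma2
    # In the reversed model X2 is the source node and X1 is regressed on X2.
    noise_var_x2 = var_x2
    noise_var_x1 = var_x1 - cov_12**2 / var_x2
    return noise_var_x2, noise_var_x1

source_var, child_var = reversed_noise_variances(beta=1.0, sigma2=1.0)
```

Algebraically the two values are (1 + beta^2) * sigma2 and sigma2 / (1 + beta^2), which coincide only for beta = 0. Under these assumptions the reversed model therefore cannot have equal error variances, so the edge direction is determined by the distribution.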



1.3 Causal Interpretation 

Our result has implications for causal inference. If $\mathcal{G}_0$ is interpreted as the causal graph of
the data generating process for $X_1, \ldots, X_p$, the problem considered here is to infer the causal
structure from the joint distribution. This is particularly interesting when the causal graph
is of interest, but interventional experiments are too expensive, unethical or even impossible
to perform. In this causal setting, our result reads: "If the observational data are generated by
a Gaussian SEM that represents the causal relationships and has the same error variances,
then the causal graph is identifiable from the joint distribution." Despite this potentially
important application, we present the statement and its proof without causal terminology.
2 Identifiability for Gaussian Models with Same Error
Variances 



We first introduce some notation that we use in the remainder of this article. The
index set $J := \{1, \ldots, p\}$ corresponds to a set of vertices in a graph. Associated with $j \in J$
are random variables $X_j \in \mathbf{X}$. Given a DAG $\mathcal{G}$, we denote the parents of a node $j$ by
$\mathbf{PA}_j^{\mathcal{G}}$, the children by $\mathbf{CH}_j^{\mathcal{G}}$, the descendants by $\mathbf{DE}_j^{\mathcal{G}}$ and the
non-descendants by $\mathbf{ND}_j^{\mathcal{G}}$.

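We now formally specify our model. Let $\mathbf{X} = \{X_1, \ldots, X_p\}$ be a finite set of variables.
We consider an SEM (with DAG $\mathcal{G}_0$) of the form

$$X_j = \sum_{k \in \mathbf{PA}_j^{\mathcal{G}_0}} \beta_{jk} X_k + N_j, \tag{2}$$

where all $N_j \sim \mathcal{N}(0, \sigma^2)$ with $\sigma^2 > 0$. Additionally, for each $j \in \{1, \ldots, p\}$ we require
$\beta_{jk} \neq 0$ for all $k \in \mathbf{PA}_j^{\mathcal{G}_0}$.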

Theorem 1 Let $\mathcal{L}(\mathbf{X})$ be generated from model (2). Then all coefficients can be
reconstructed from $\mathcal{L}(\mathbf{X})$. In particular, $\mathcal{G}_0$ is identifiable.
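To make the statement of Theorem 1 concrete, here is a hedged sketch that recovers an ordering and the coefficients from the exact covariance matrix implied by model (2). The greedy procedure is not taken from the text. It rests on the following observation, which is an assumption of this illustration: once a set of variables containing all of its own ancestors has been selected, a remaining node whose parents all lie in that set has conditional variance equal to the common noise variance, while every other remaining node has strictly larger conditional variance. The names `implied_covariance` and `recover_from_covariance` are hypothetical.

```python
import numpy as np

def implied_covariance(B, sigma2):
    """Covariance of X = B X + N with cov(N) = sigma2 * I; B[j, k] is the coefficient of X_k in X_j."""
    p = B.shape[0]
    A = np.linalg.inv(np.eye(p) - B)
    return sigma2 * A @ A.T

def recover_from_covariance(cov, tol=1e-9):
    """Greedy recovery: repeatedly pick the node with the smallest conditional variance."""
    p = cov.shape[0]
    order, remaining = [], list(range(p))
    B = np.zeros((p, p))
    while remaining:
        best, best_var, best_coef = None, np.inf, None
        for j in remaining:
            if order:
                # Population regression of X_j on the variables selected so far.
                coef = np.linalg.solve(cov[np.ix_(order, order)], cov[order, j])
                cond_var = cov[j, j] - cov[j, order] @ coef
            else:
                coef, cond_var = np.zeros(0), cov[j, j]
            if cond_var < best_var:
                best, best_var, best_coef = j, cond_var, coef
        B[best, order] = best_coef
        order.append(best)
        remaining.remove(best)
    B[np.abs(B) < tol] = 0.0  # coefficients of non-parents vanish up to rounding error
    return order, B

# Hypothetical example with edges 2 -> 0, 2 -> 1 and 0 -> 1.
B_true = np.zeros((3, 3))
B_true[0, 2] = 0.7
B_true[1, 2] = -0.4
B_true[1, 0] = 1.5
cov = implied_covariance(B_true, sigma2=2.0)
order, B_hat = recover_from_covariance(cov)
```

The sketch works with the population covariance only; with finite samples the variance comparisons become noisy and a proper estimation procedure is needed.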

Remark 1 (Faithfulness and Causal Minimality) Note that Theorem 1 and the
identifiability results (i) and (ii) from Section 1.2 assume causal minimality, a weak form of
faithfulness. From our point of view, causal minimality is as natural as the Markov
condition and is in accordance with the intuitive understanding of a causal influence between
variables. Zhang and Spirtes [2008] define causal minimality as
follows: Let $\mathcal{G}_0$ be the true causal graph. Then $\mathcal{L}(\mathbf{X})$ is not Markov with respect to any proper
subgraph of $\mathcal{G}_0$. In the linear Gaussian case, causal minimality is implied by non-vanishing coefficients
$\beta_{jk}$, $k \in \mathbf{PA}_j^{\mathcal{G}_0}$. This follows from Lemma 4 (below) and Proposition 2 in Peters et al.
[2011].

In Section 1.2 we mentioned that methods based on conditional independence tests usually
assume faithfulness. Zhang and Spirtes [2008] show that given the Markov condition and
causal minimality some violations of faithfulness are detectable. They call the non-detectable
part triangle faithfulness, which is still stronger than causal minimality.

Remark 2 (Error Covariance with Unknown Scaling) Theorem 1 can be generalized
to the case where the error covariance matrix has the form

$$\Sigma_{\mathbf{N}} = \alpha^2 \cdot \mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2)$$

with pre-specified $\sigma_1^2, \ldots, \sigma_p^2$ and unknown scaling $\alpha^2$.
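One way to see why Remark 2 is plausible is to divide each variable by the square root of its pre-specified diagonal entry: the rescaled variables then follow a linear SEM with the same graph and a single common noise variance. The sketch below checks this numerically for one hypothetical example. The names `rescale_coefficients`, `known_scales` (the square roots of the pre-specified diagonal entries) and `scaling` (the unknown common factor) are assumptions of this illustration.

```python
import numpy as np

def rescale_coefficients(B, known_scales):
    """Coefficients for the rescaled variables X_j / s_j: entry (j, k) becomes B[j, k] * s_k / s_j."""
    s = np.asarray(known_scales, dtype=float)
    return B * s[None, :] / s[:, None]

# Hypothetical example: edges 0 -> 1, 0 -> 2, 1 -> 2.
B = np.array([[0.0, 0.0, 0.0],
              [0.9, 0.0, 0.0],
              [-0.6, 1.1, 0.0]])
known_scales = np.array([1.0, 2.0, 0.5])
scaling = 1.7
I = np.eye(3)

# Covariance of X when node j has noise variance scaling * known_scales[j]**2.
A = np.linalg.inv(I - B)
cov_x = scaling * A @ np.diag(known_scales**2) @ A.T

# Covariance of the rescaled variables, computed in two ways.
D_inv = np.diag(1.0 / known_scales)
cov_rescaled = D_inv @ cov_x @ D_inv
A_tilde = np.linalg.inv(I - rescale_coefficients(B, known_scales))
cov_equal = scaling * A_tilde @ A_tilde.T
```

Since the rescaled coefficients vanish exactly where the original ones do, both models share the same DAG.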



3 Conclusions 

We have shown that a Gaussian SEM with same error variances is identifiable from the
distribution. In particular, the corresponding DAG is identifiable, while for general Gaussian
SEMs we can at best identify the Markov equivalence class (assuming faithfulness). The
assumption of same error variances constitutes an interesting alternative to the restrictions
of non-linear functions and non-Gaussian noise.

Estimation of Gaussian SEMs with same error variances can be done using maximum
likelihood with the BIC penalty, in analogy to Chickering [2002], where the search is done in the
space of DAGs rather than Markov equivalence classes.
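The text does not spell out the score, so the following is only a sketch of how a BIC score for one fixed DAG might be computed under model (2). It assumes zero-mean data, fits each node by least squares on its parents, pools all residuals into one shared variance estimate, and counts the edge coefficients plus that shared variance as parameters. The function name `bic_equal_variance` and the parameter count are assumptions; the search over DAGs is omitted.

```python
import numpy as np

def bic_equal_variance(X, parents):
    """BIC of a zero-mean linear Gaussian SEM with one shared noise variance, for a fixed DAG."""
    n, p = X.shape
    rss, n_edges = 0.0, 0
    for j in range(p):
        pa = list(parents[j])
        n_edges += len(pa)
        if pa:
            coef, *_ = np.linalg.lstsq(X[:, pa], X[:, j], rcond=None)
            resid = X[:, j] - X[:, pa] @ coef
        else:
            resid = X[:, j]
        rss += resid @ resid
    sigma2_hat = rss / (n * p)  # pooled maximum likelihood estimate of the common variance
    log_lik = -0.5 * n * p * (np.log(2 * np.pi * sigma2_hat) + 1.0)
    n_params = n_edges + 1      # edge coefficients plus the shared variance
    return -2.0 * log_lik + np.log(n) * n_params

# Hypothetical data from X0 -> X1 with unit coefficient and unit noise variances.
rng = np.random.default_rng(0)
x0 = rng.normal(size=5000)
x1 = x0 + rng.normal(size=5000)
X = np.column_stack([x0, x1])
bic_true = bic_equal_variance(X, {0: [], 1: [0]})
bic_reversed = bic_equal_variance(X, {0: [1], 1: []})
```

In this example the two DAGs are Markov equivalent, yet the pooled-variance likelihood separates them, which is why the search can operate on DAGs directly.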
4 Proof



4.1 Some Lemmata 

In the following two sections we consider different subsets of the set of variables $\mathbf{X}$. To
simplify notation we no longer distinguish between indices and variables, since the
context should clarify the meaning. This way, we can also speak of the parents $\mathbf{PA}_B^{\mathcal{G}}$ of a
variable $B \in \mathbf{X}$. We also consider sets of variables $\mathbf{S} \subseteq \mathbf{X}$ as a single multivariate variable.
The following four statements are all plausible and their proofs are mostly technical.
The reader may skip to Section 4.2 and use the lemmata whenever needed.

Lemma 1 Let $(A_1, \ldots, A_m) \sim \mathcal{N}((\mu_1, \ldots, \mu_m)^T, \Sigma)$ with strictly positive definite $\Sigma$ and
define $A_1^* := A_1|_{(A_2, \ldots, A_m) = (a_2, \ldots, a_m)}$. Then, for all $(a_2, \ldots, a_m) \in \mathbb{R}^{m-1}$ it holds that

$$\mathrm{var}\, A_1^* \le \mathrm{var}\, A_1 .$$

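Proof. Let us decompose $\Sigma$ into

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \Sigma_{12}^T \\ \Sigma_{12} & \Sigma_{22} \end{pmatrix}$$

with $\Sigma_{12}$ being an $(m-1) \times 1$ vector. Then

$$\mathrm{var}\, A_1^* = \sigma_1^2 - \Sigma_{12}^T \cdot \Sigma_{22}^{-1} \cdot \Sigma_{12} \le \sigma_1^2$$

since $\Sigma_{22}^{-1}$ is positive definite. □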
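The variance formula in this proof is the Schur complement of the lower-right block. As a small numerical illustration, the snippet below evaluates it for one hypothetical positive definite covariance matrix; the function name `conditional_variance_first` and the matrix entries are assumptions made for the example.

```python
import numpy as np

def conditional_variance_first(cov):
    """Variance of A_1 given (A_2, ..., A_m) for a Gaussian vector with covariance `cov`."""
    sigma_12 = cov[1:, 0]    # covariances between A_1 and the conditioning variables
    sigma_22 = cov[1:, 1:]   # covariance of the conditioning variables
    return cov[0, 0] - sigma_12 @ np.linalg.solve(sigma_22, sigma_12)

cov = np.array([[2.0, 0.6, -0.3],
                [0.6, 1.0, 0.2],
                [-0.3, 0.2, 1.5]])
cond_var = conditional_variance_first(cov)
```

The result does not depend on the values conditioned on, which matches the statement of the lemma for all $(a_2, \ldots, a_m)$.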

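Lemma 2 (Peters et al. [2011]) Let $Y \in \mathcal{Y}$, $N \in \mathcal{N}$, $\mathbf{Q} \in \mathcal{Q}$, $\mathbf{R} \in \mathcal{R}$ be random variables
whose joint distribution is absolutely continuous with respect to some product measure ($\mathbf{Q}$
and $\mathbf{R}$ can be multivariate) and with density $p_{Y,\mathbf{Q},\mathbf{R},N}(y, \mathbf{q}, \mathbf{r}, n)$. Let
$f: \mathcal{Y} \times \mathcal{Q} \times \mathcal{N} \to \mathbb{R}$ be a measurable function. If $N \perp\!\!\!\perp (Y, \mathbf{Q}, \mathbf{R})$ then for all
$\mathbf{q} \in \mathcal{Q}$, $\mathbf{r} \in \mathcal{R}$ with $p_{\mathbf{Q},\mathbf{R}}(\mathbf{q}, \mathbf{r}) > 0$:

$$f(Y, \mathbf{Q}, N)|_{\mathbf{Q}=\mathbf{q}, \mathbf{R}=\mathbf{r}} = f(Y|_{\mathbf{Q}=\mathbf{q}, \mathbf{R}=\mathbf{r}}, \mathbf{q}, N)$$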

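Lemma 3 (Peters et al. [2011]) Let $\mathcal{L}(\mathbf{X})$ be generated according to an SEM as in (2)
with corresponding DAG $\mathcal{G}$ and consider a variable $X \in \mathbf{X}$. If $\mathbf{S} \subseteq \mathbf{ND}_X^{\mathcal{G}}$ then $N_X \perp\!\!\!\perp \mathbf{S}$.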

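Lemma 4 Let $\mathcal{L}(\mathbf{X})$ be generated from an SEM as in (2) with DAG $\mathcal{G}$. Consider a variable
$B \in \mathbf{X}$ and one of its parents $A \in \mathbf{PA}_B^{\mathcal{G}}$. For all sets $\mathbf{S}$ with
$\mathbf{PA}_B^{\mathcal{G}} \setminus \{A\} \subseteq \mathbf{S} \subseteq \mathbf{ND}_B^{\mathcal{G}} \setminus \{A\}$ we have

$$B \not\perp\!\!\!\perp A \mid \mathbf{S} .$$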

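Proof. Define $\mathbf{Q} := \mathbf{PA}_B^{\mathcal{G}} \setminus \{A\}$ such that we have $\mathbf{S} = (\mathbf{Q}, \mathbf{R})$ for some $\mathbf{R}$. Using Lemma 2
we have:

$$B|_{\mathbf{Q}=\mathbf{q}, \mathbf{R}=\mathbf{r}} = f(\mathbf{q}) + \beta \cdot A|_{\mathbf{Q}=\mathbf{q}, \mathbf{R}=\mathbf{r}} + N_B$$

with $N_B \perp\!\!\!\perp A|_{\mathbf{Q}=\mathbf{q}, \mathbf{R}=\mathbf{r}}$. But since $\beta \neq 0$, it follows:

$$A|_{\mathbf{Q}=\mathbf{q}, \mathbf{R}=\mathbf{r}} \not\perp\!\!\!\perp B|_{\mathbf{Q}=\mathbf{q}, \mathbf{R}=\mathbf{r}} .$$

□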
Figure 1 (panels: graph $\mathcal{G}$, graph $\mathcal{G}'$): This situation is dealt with in part (ii)-2 of the proof
(with $\mathbf{S} = \{S_1, S_2\}$ and $\mathbf{D} = \emptyset$). It contains the proof's main argument.



4.2 Proof of Theorem 1

The idea of the proof is as follows: We assume there are two SEMs with graphs $\mathcal{G}$ and $\mathcal{G}'$
that lead to the same joint distribution and then deduce a contradiction. We first try to
find variables $L$ and $Y$ that have the same set of parents $\mathbf{S} = \{S_1, S_2\}$ in both graphs, but
reversed edges between each other in $\mathcal{G}$ and $\mathcal{G}'$ (as in Figure 1). This case is treated in part
(ii)-2 and contains the main argument of the proof.

If we assumed faithfulness, $\mathcal{G}$ and $\mathcal{G}'$ could be supposed to be Markov equivalent, which
itself implies the existence of such an $L$ and $Y$ [Chickering, 1995, Theorem 2]. Since we are
not assuming faithfulness, proving the existence of a situation similar to that in Figure 1 requires
more work. Note that this part of the proof (which is due to not assuming faithfulness) is
taken from Peters et al. [2011] and remains almost the same. It is given here for
completeness. The difference from Peters et al. [2011] is that we can prove causal minimality and do
not have to assume it. Also new are Lemmata 1 and 4 as well as the proof's main argument
(ii)-2. We now give a formal proof.



We assume that there are two instances of an SEM as in Theorem 1 that both induce
$\mathcal{L}(\mathbf{X})$, one with graph $\mathcal{G}$, the other with graph $\mathcal{G}'$. We will show that $\mathcal{G} = \mathcal{G}'$. Since DAGs do
not contain any cycles, we always find nodes that have no descendants (start a directed path
at some node: after at most $\#\mathbf{X} - 1$ steps we reach a node without a child). Eliminating
such a node from the graph leads to a DAG, again; we can discard further nodes without
children in the new graph. We repeat this process for all nodes that have no children in both
$\mathcal{G}$ and $\mathcal{G}'$ and have the same parents in both graphs. If we end up with no nodes left, the
two graphs are identical and we are done. Otherwise, we end up with two smaller graphs
that we again call $\mathcal{G}$ and $\mathcal{G}'$ and a node $L$ that has no children in $\mathcal{G}$ and either
$\mathbf{PA}_L^{\mathcal{G}} \neq \mathbf{PA}_L^{\mathcal{G}'}$ or $\mathbf{CH}_L^{\mathcal{G}'} \neq \emptyset$. We will show that this leads to a contradiction.
Importantly, because of the Markov property of $\mathcal{G}$, all other nodes are independent of $L$ given $\mathbf{PA}_L^{\mathcal{G}}$:

$$L \perp\!\!\!\perp \mathbf{X} \setminus (\mathbf{PA}_L^{\mathcal{G}} \cup \{L\}) \mid \mathbf{PA}_L^{\mathcal{G}} . \tag{3}$$
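The elimination step described in this paragraph can be sketched as a short routine. The representation of a DAG as a dictionary mapping each node to its parent set, and the name `prune_common_sinks`, are assumptions made for this illustration only.

```python
def prune_common_sinks(parents_g, parents_h):
    """Repeatedly drop nodes that are childless in both DAGs and have the same parents in both."""
    g = {v: set(pa) for v, pa in parents_g.items()}
    h = {v: set(pa) for v, pa in parents_h.items()}
    changed = True
    while changed:
        changed = False
        for v in list(g):
            childless_g = all(v not in pa for pa in g.values())
            childless_h = all(v not in pa for pa in h.values())
            if childless_g and childless_h and g[v] == h[v]:
                # v appears in no parent set, so deleting its entry removes it completely.
                del g[v]
                del h[v]
                changed = True
                break
    return g, h

# Identical graphs are pruned away entirely.
same = {1: [], 2: [1], 3: [1, 2]}
rest_same, _ = prune_common_sinks(same, same)

# Graphs that differ in the edge between 1 and 2 keep exactly those two nodes.
g_rest, h_rest = prune_common_sinks({1: [], 2: [1], 3: [1, 2]},
                                    {1: [2], 2: [], 3: [1, 2]})
```

If the routine returns non-empty graphs, any node that is childless in the first remaining graph can play the role of $L$ in the argument that follows.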

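To make the arguments easier to understand, we introduce the following notation (see
also Figure 2): We partition the $\mathcal{G}$-parents of $L$ into $\mathbf{Y}$, $\mathbf{Z}$ and $\mathbf{W}$. Here, $\mathbf{Z}$ are also $\mathcal{G}'$-parents
of $L$, $\mathbf{Y}$ are $\mathcal{G}'$-children of $L$ and $\mathbf{W}$ are not adjacent to $L$ in $\mathcal{G}'$. We denote by $\mathbf{D}$ the
$\mathcal{G}'$-parents of $L$ that are not adjacent to $L$ in $\mathcal{G}$ and by $\mathbf{E}$ the $\mathcal{G}'$-children of $L$ that are not
adjacent to $L$ in $\mathcal{G}$. Thus: $\mathbf{PA}_L^{\mathcal{G}} = \mathbf{Y} \cup \mathbf{Z} \cup \mathbf{W}$, $\mathbf{CH}_L^{\mathcal{G}} = \emptyset$,
$\mathbf{PA}_L^{\mathcal{G}'} = \mathbf{Z} \cup \mathbf{D}$, $\mathbf{CH}_L^{\mathcal{G}'} = \mathbf{Y} \cup \mathbf{E}$.
Consider $\mathbf{T} := \mathbf{W} \cup \mathbf{Y}$. We distinguish two cases: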



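Case (i): $\mathbf{T} = \emptyset$.

Then there must be a node $D \in \mathbf{D}$ or a node $E \in \mathbf{E}$, otherwise $L$ would have been discarded.

1. If there is a $D \in \mathbf{D}$ then (3) implies $L \perp\!\!\!\perp D \mid \mathbf{S}$ for $\mathbf{S} := \mathbf{Z} \cup \mathbf{D} \setminus \{D\}$, which contradicts
Lemma 4 (applied to $\mathcal{G}'$).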
Figure 2 (panels: part of $\mathcal{G}$, part of $\mathcal{G}'$): Nodes adjacent to $L$ in $\mathcal{G}$ and $\mathcal{G}'$.



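2. If $\mathbf{D} = \emptyset$ and there is $E \in \mathbf{E}$ then $E \perp\!\!\!\perp L \mid \mathbf{S}$ holds for
$\mathbf{S} := \mathbf{Z} \cup \mathbf{PA}_E^{\mathcal{G}'} \setminus \{L\}$, which also
contradicts Lemma 4 (note that $\mathbf{Z} \subseteq \mathbf{ND}_E^{\mathcal{G}'}$ to avoid cycles).
Case (ii): $\mathbf{T} \neq \emptyset$.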

Then $\mathbf{T}$ contains a "$\mathcal{G}'$-youngest" node with the property that there is no directed $\mathcal{G}'$-path
from this node to any other node in $\mathbf{T}$. This node may not be unique.

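1. Suppose that some $W \in \mathbf{W}$ is such a youngest node. Consider the DAG $\tilde{\mathcal{G}}'$ that equals
$\mathcal{G}'$ with additional edges $Y \to W$ and $W' \to W$ for all $Y \in \mathbf{Y}$ and $W' \in \mathbf{W} \setminus \{W\}$.
In $\tilde{\mathcal{G}}'$, $L$ and $W$ are not adjacent. Thus we find a set $\tilde{\mathbf{S}}$ such that $\tilde{\mathbf{S}}$ d-separates $L$ and
$W$ in $\tilde{\mathcal{G}}'$; indeed, one can take (see footnote 1)
$\tilde{\mathbf{S}} := (\mathbf{CH}_L^{\tilde{\mathcal{G}}'} \cup \mathbf{PA}^{\tilde{\mathcal{G}}'}(\mathbf{CH}_L^{\tilde{\mathcal{G}}'})) \setminus (\mathbf{U} \cup \mathbf{DE}^{\tilde{\mathcal{G}}'}(\mathbf{U}))$ with
$\mathbf{U} := \mathbf{CH}_L^{\tilde{\mathcal{G}}'} \cap \mathbf{CH}_W^{\tilde{\mathcal{G}}'}$. Then also
$\mathbf{S} := \tilde{\mathbf{S}} \cup \{\mathbf{Y}, \mathbf{Z}, \mathbf{W} \setminus \{W\}\}$ d-separates $L$ and $W$ in $\tilde{\mathcal{G}}'$.
Indeed: All $Y \in \mathbf{Y}$ are already in $\tilde{\mathbf{S}}$ in order to block $L \to Y \to W$. Suppose there is
a $\tilde{\mathcal{G}}'$-path that is blocked by $\tilde{\mathbf{S}}$ and unblocked if we add $Z$ and $W'$ nodes to $\tilde{\mathbf{S}}$. How
can we unblock a path by including more nodes? The path ($L \cdots V_1 \cdots U_1 \cdots W$ in
Figure 3) must contain a collider $V_1$ that is an ancestor of a $Z$ with $V_1, \ldots, V_m, Z \notin \tilde{\mathbf{S}}$
and corresponding nodes $U_i$ for a $W'$ node. Choose $V_1$ and $U_1$ on the given path
so close to each other that there is no such collider in between. If there is no
$V_1$, choose $U_1$ close to $L$; if there is no $U_1$, choose $V_1$ close to $W$. Now the path
$L \leftarrow Z \cdots V_1 \cdots U_1 \cdots W' \to W$ is unblocked given $\tilde{\mathbf{S}}$, which is a contradiction to the
assumption that $\tilde{\mathbf{S}}$ d-separates $L$ and $W$.
But then $\mathbf{S}$ d-separates $L$ and $W$ in $\mathcal{G}'$, too (there are fewer paths), and we have $L \perp\!\!\!\perp W \mid \mathbf{S}$,
which contradicts Lemma 4 (applied to $\mathcal{G}$).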




Figure 3: Assume the path $L \cdots V_1 \cdots U_1 \cdots W$ is blocked by $\tilde{\mathbf{S}}$, but unblocked if we include $Z$
and $W'$. Then the red path is unblocked given $\tilde{\mathbf{S}}$.

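2. Therefore, the $\mathcal{G}'$-youngest node in $\mathbf{T}$ must be some $Y \in \mathbf{Y}$.

First, note that

$$\sigma_{\mathcal{G}}^2 = \sigma_{\mathcal{G}'}^2 = \min_{X \in \mathbf{X}} \mathrm{var}(X) = \sigma^2 \tag{4}$$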


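Footnote 1: By $\mathbf{PA}^{\mathcal{G}}(\mathbf{B})$ for some set $\mathbf{B} \subseteq \mathbf{X}$ we denote the union of all parents:
$\bigcup_{B \in \mathbf{B}} \mathbf{PA}_B^{\mathcal{G}}$.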
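We define $\mathbf{S} := \mathbf{PA}_L^{\mathcal{G}} \setminus \{Y\} \cup \mathbf{D}$. Clearly, $\mathbf{S} \subseteq \mathbf{ND}_L^{\mathcal{G}}$ since $L$ does not have any
descendants in $\mathcal{G}$. Define $\mathbf{Q} := \mathbf{PA}_L^{\mathcal{G}} \setminus \{Y\}$ and take any $\mathbf{s} = (\mathbf{q}, \mathbf{d})$. Define

$$L^* := L|_{\mathbf{S}=\mathbf{s}} \quad \text{and} \quad Y^* := Y|_{\mathbf{S}=\mathbf{s}} .$$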

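Then, from $\mathcal{G}$ and Lemma 2 we find

$$\begin{aligned}
L^* &= f_L(\mathbf{q}, Y^*) + N_L, & N_L &\perp\!\!\!\perp Y|_{\mathbf{S}=\mathbf{s}} \\
    &= f(\mathbf{q}) + \beta \cdot Y^* + N_L, & N_L &\perp\!\!\!\perp Y|_{\mathbf{S}=\mathbf{s}}
\end{aligned}$$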

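Note that the independence holds because of $\mathbf{S} \subseteq \mathbf{ND}_L^{\mathcal{G}}$. Then, we have

$$\mathrm{var}(L^*) = \beta^2 \, \mathrm{var}(Y^*) + \sigma^2 > \sigma^2 . \tag{5}$$

Since $\mathbf{PA}_L^{\mathcal{G}'} \subseteq \mathbf{S}$ we find from $\mathcal{G}'$ and Lemma 1 that

$$\mathrm{var}(L^*) \le \sigma^2 . \tag{6}$$

(Note that $\det(\mathrm{cov}(\mathbf{X})) \neq 0$.) Equations (5) and (6) contradict each other.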

In order to prove Remark 2, replace $\mathrm{var}(X)$ by $\mathrm{var}(X)/\sigma_X^2$ in (4) and $\sigma^2$ by $\alpha^2 \cdot \sigma_L^2$ in
equations (5) and (6). □